Sparse Non-negative Matrix Language Modeling
نویسندگان
چکیده
منابع مشابه
Sparse Non-negative Matrix Language Modeling
We present Sparse Non-negative Matrix (SNM) estimation, a novel probability estimation technique for language modeling that can efficiently incorporate arbitrary features. We evaluate SNM language models on two corpora: the One Billion Word Benchmark and a subset of the LDC English Gigaword corpus. Results show that SNM language models trained with n-gram features are a close match for the well...
متن کاملSparse non-negative matrix language modeling for skip-grams
We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments empirically evaluating these techniques on the One Billion Word Benchmark [3] shows that with skip-gram features SNMLMs are able to match the state-of-theart recurrent neural network (RNN) LMs; combining the two modeling techniques yields the best ...
متن کاملPruning sparse non-negative matrix n-gram language models
In this paper we present a pruning algorithm and experimental results for our recently proposed Sparse Non-negative Matrix (SNM) family of language models (LMs). We show that when trained with only n-gram features SNMLM pruning based on a mutual information criterion yields the best known pruned model on the One Billion Word Language Model Benchmark, reducing perplexity with 18% and 57% over Ka...
متن کاملSkip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation
We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments empirically evaluating it on the One Billion Word Benchmark [Chelba et al., 2013] shows that SNM n-gram LMs perform almost as well as the well-established Kneser-Ney (KN) models. When using skip-gram features the models are able to match the state-...
متن کاملSparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap
We present a new method for estimating the sparse non-negative model (SNM) by using a small amount of held-out data and the multinomial loss that is natural for language modeling; we validate it experimentally against the previous estimation method which uses leave-one-out on training data and a binary loss function and show that it performs equally well. Being able to train on held-out data is...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Transactions of the Association for Computational Linguistics
سال: 2016
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00102